Patrick McCaslin, patrick.mccaslin@thebaltimorebanner.com, June 2, 2025
This story looks into the moderate, positive relationship between per capita income and the local public school’s performance on Maryland’s state test, the MCAP. It will discuss the effect of The Blueprint for Maryland’s Future on ameliorating this issue.
State test scores have largely stagnated over the past few years, regardless of race and location despite the state investing millions of dollars into schools. For low-scoring districts, this indicates that schools are struggling to catch up and income could help illuminate the why. Especially now, the economy is an issue that is particularly relevant to Americans and showing its effect on educational outcomes could add to the picture. Additionally, the Blueprint has provisions for helping districts in poverty that have recently been called into question. This will be relevant as the state approaches its gubernatorial election and other local elections.
In early April, Maryland legislators concluded a standoff with Gov. Wes Moore surrounding the Blueprint. They gutted Moore’s bill to defund the Blueprint over the course of four years. Among other requests, it specifically called to defund the provision that increased funding for students who live in areas with concentrated poverty.
Understanding income’s effect on the performance of schools may provide important context to understanding the potential benefit of the Blueprint.
Primarily parents of students in public schools. Secondarily, politically-minded readers who may have a strong opinion about allocating such a large part of the budget to the Blueprint.
A demonstrated positive correlation between income and MCAP test score may influence voters’ opinions on the Blueprint and affect how the state allocates money in the future. It could also illuminate an important underlying factor in educational success, especially if it seems to affect certain townships more than others.
Comparing the local per capita income with the percentage of the school that scores proficient reveals only a moderate correlation. Mathematically, this means income can explain ~25% of the variability in percentage scoring proficient.
Comparing the local per capita income with the change in the percentage of the school that scores proficient reveals little correlation. Mathematically, this means income can explain ~25% of the variability in percentage scoring proficient.
Originally, I was going to complete a geospatial analysis for a more accurate picture of the income vs scores relationship. However, now I’m wondering if I should focus the story on the surprisingly small effect it has on test scores?
Or, if I should completely pivot to other factors such as race or geography. Perhaps stratifying these results by county would reveal different impacts depending on the county? I noted stronger results in Montgomery County for instance.
#install.packages("tidyverse")
#install.packages("janitor")
#install.packages("tidycensus")
#install.packages("stringr")
library(tidycensus)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(stringr)
data_cleaner <- function(df) {
#If *, delete from df.
df <- df |>
filter(Proficient.Pct != "*")
#Convert strings stored in Proficient.Pct to doubles. When <5, convert to 5. When >95 convert to 95. Pasted and modified from ChatGPT.
df <- df |>
mutate(
Proficient.Pct = case_when( #new Proficient.Pct
Proficient.Pct == "<= 5.0" ~ 5L,
Proficient.Pct == ">= 95.0" ~ 95L,
suppressWarnings(!is.na(as.numeric(Proficient.Pct))) ~ as.numeric(Proficient.Pct),
TRUE ~ 0
)
)
df <- df |>
filter(str_ends(str_trim(Assessment), fixed("All Grades"))) |>
arrange(desc(Proficient.Pct))
return(df)
}
ela_math_scores_2019 <- read.csv("../MCAP Scores/MCAP_ELA_MATH_2019.csv")
ela_scores_2021 <- read.csv("../MCAP Scores/2021_MCAP_ELA_Scores.csv")
math_scores_2021 <- read.csv("../MCAP Scores/2021_MCAP_MATH_Scores.csv")
ela_scores_2023 <- read.csv("../MCAP Scores/2023_MCAP_ELA_Scores.csv")
math_scores_2023 <- read.csv("../MCAP Scores/2023_MCAP_MATH_Scores.csv")
# ela_scores_2024 <- read.csv("../MCAP Scores/2024_MCAP_ELA_Scores.csv")
#
# math_scores_2024 <- read.csv("MCAP Scores/2024_MCAP_MATH_Scores.csv")
math_scores_2023 <- data_cleaner(math_scores_2023)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Proficient.Pct = case_when(...)`.
## Caused by warning:
## ! NAs introduced by coercion
ela_scores_2023 <- data_cleaner(ela_scores_2023)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Proficient.Pct = case_when(...)`.
## Caused by warning:
## ! NAs introduced by coercion
#Gets per capita income data by zip code for the entire U.S. based off of 2023 ACS
percap_national_income2023 <- get_acs(geography = "zcta",
variables = "B19301_001",
year = 2023) |>
rename(Income_per_cap = estimate)
## Getting data from the 2019-2023 5-year ACS
#Read in df which relates all schools to zip codes.
schools_zips <- read.csv("../schools_zips.csv")
#Filter to only contain zip codes that are found on the list of zip codes, instead of the entire U.S.
zip_incomes <- percap_national_income2023 |>
filter(GEOID %in% schools_zips$ZIP)
#Now join one to many. We have a df () which contains school names and zip codes. We have another df () which contains zip codes and income.
#First change column name from GEOID to ZIP to allow for join.
zip_incomes <- zip_incomes |> rename(ZIP = GEOID)
#And join! Now we have schools and income connected
schools_income <- left_join(schools_zips, zip_incomes, by = "ZIP")
normalize <- function(name) {
name |>
# Move parentheses content to the front, e.g., "Whitman (Walt) High" -> "Walt Whitman High" Taken from ChatGPT.
str_replace("^(.+?)\\s*\\((.+?)\\)(.*)$", "\\2 \\1\\3") |>
# Remove anything starting with ' #' and after (e.g., "# 144")
str_replace("\\s+#.*$", "") |>
str_squish()
}
schools_income$School.Name <- sapply(schools_income$SCHOOL_NAME, normalize)
#Join income with test scores
income_ela_scores <- ela_scores_2023 |>
inner_join(schools_income, by = "School.Name")
## Warning in inner_join(ela_scores_2023, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 513 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
income_math_scores <- math_scores_2023 |>
inner_join(schools_income, by = "School.Name")
## Warning in inner_join(math_scores_2023, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 38 of `x` matches multiple rows in `y`.
## ℹ Row 193 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
no_outliers <- income_ela_scores |>
filter(income_ela_scores$Income_per_cap < 80000)
#comes out to 0.477671
income_ela_scores <- income_ela_scores |>
filter(!is.na(income_ela_scores$Income_per_cap))
cor(income_ela_scores$Income_per_cap, income_ela_scores$Proficient.Pct)
## [1] 0.5299482
#comes out to 0.529948
no_outliers_math <- income_math_scores |>
filter(income_math_scores$Income_per_cap < 80000)
cor(no_outliers_math$Income_per_cap, no_outliers_math$Proficient.Pct)
## [1] 0.3920974
#comes out to 0.392
income_math_scores <- income_math_scores |>
filter(!is.na(income_math_scores$Income_per_cap))
cor(income_math_scores$Income_per_cap, income_math_scores$Proficient.Pct)
## [1] 0.4711466
#comes out to 0.4711
cor(no_outliers$Income_per_cap, no_outliers$Proficient.Pct)
## [1] 0.477671
difference_calculator <- function(df,df2) {
df <- clean_names(df)
df <- df |>
mutate(
proficient_pct = suppressWarnings(as.numeric(proficient_pct)),
tested_count = suppressWarnings(as.numeric(tested_count))
) |>
filter(!is.na(proficient_pct), !is.na(tested_count))
df <- df %>%
mutate(assessment = case_when(
str_detect(assessment, "Mathematics|Algebra|Geometry") ~ "Math",
str_detect(assessment, "English") ~ "English",
TRUE ~ "Other"
)) |>
group_by(school_name, assessment) |>
summarise(
Weighted_Proficient = sum(proficient_pct * tested_count, na.rm = TRUE) / sum(tested_count, na.rm = TRUE),
.groups = "drop"
)
#rename for join later
df <- df |>
rename(School.Name = school_name)
#join
df_income_scores <- df |>
left_join(df2, by = "School.Name")
#calculate difference
df_income_scores$diference <- df_income_scores$Proficient.Pct - df_income_scores$Weighted_Proficient
#filter NAs
df_income_scores <- df_income_scores |>
filter(!is.na(df_income_scores$diference))
return(df_income_scores)
}
a <- difference_calculator(ela_scores_2021,ela_scores_2023)
b <- difference_calculator(math_scores_2021,math_scores_2023)
a <- a |>
inner_join(schools_income, by = "School.Name")
## Warning in inner_join(a, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 66 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
b <- b |>
inner_join(schools_income, by = "School.Name")
## Warning in inner_join(b, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 61 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
ela_math_scores_2019 <- ela_math_scores_2019 %>%
mutate(assessment = case_when(
str_detect(Assessment, "Mathematics|Algebra|Geometry") ~ "Math",
str_detect(Assessment, "English") ~ "English",
TRUE ~ "Other"
))
ela_math_scores_2019 <- clean_names(ela_math_scores_2019)
ela_math_scores_2019 <- ela_math_scores_2019 %>%
mutate(
proficient_pct = suppressWarnings(as.numeric(proficient_pct)),
tested_count = suppressWarnings(as.numeric(tested_count))
) %>%
filter(!is.na(proficient_pct), !is.na(tested_count))
test <- ela_math_scores_2019 %>%
mutate(assessment = case_when(
str_detect(assessment, "Mathematics|Algebra|Geometry") ~ "Math",
str_detect(assessment, "English") ~ "English",
TRUE ~ "Other"
)) |>
group_by(school_name, assessment) |>
summarise(
Weighted_Proficient = sum(proficient_pct * tested_count, na.rm = TRUE) / sum(tested_count, na.rm = TRUE),
.groups = "drop"
)
#separate by math/english
test_ela <- test |>
filter(str_detect(assessment, "English"))
test_math <- test |>
filter(str_detect(assessment, "Math"))
#rename for join later
test_ela <- test_ela |>
rename(School.Name = school_name)
test_math <- test_math |>
rename(School.Name = school_name)
#join 2019 and 2023
income_ela_scores <- test_ela |>
left_join(ela_scores_2023, by = "School.Name")
income_math_scores <- test_math |>
left_join(math_scores_2023, by = "School.Name")
#calculate difference
income_ela_scores$diference <- income_ela_scores$Proficient.Pct - income_ela_scores$Weighted_Proficient
income_math_scores$diference <- income_math_scores$Proficient.Pct - income_math_scores$Weighted_Proficient
#filter NAs
income_ela_scores <- income_ela_scores |>
filter(!is.na(income_ela_scores$diference))
income_math_scores <- income_math_scores |>
filter(!is.na(income_math_scores$diference))
ela_difference <- income_ela_scores |>
inner_join(schools_income, by = "School.Name")
## Warning in inner_join(income_ela_scores, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 62 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
math_difference <- income_math_scores |>
inner_join(schools_income, by = "School.Name")
## Warning in inner_join(income_math_scores, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 58 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
#Gets per capita income data by zip code for the entire U.S. based off of 2023 ACS
# percap_national_income2023 <- get_acs(geography = "zcta",
# variables = "B19301_001",
# year = 2023) |>
# rename(Income_per_cap = estimate)
#
# #Read in df which relates all schools to zip codes.
# maryland_zips <- zips$ZCTA5N
#
# #Filter to only contain zip codes that are found on the list of zip codes, instead of the entire U.S.
#
# maryland_incomes <- percap_national_income2023 |>
# filter(GEOID %in% maryland_zips)